Beaked whale SNP Panel
Intro
An overview of the process for selecting windows for the SNP panel:
Genotype likelihoods > Preliminary population structure analysis > Identify windows with high diversity > Include windows that uniquely align with three reference genomes > Remove windows within 50,000 bps of each other > Call genotype likelihoods for a subset of windows
Overview of the datasets and specific insights on known population structure
Z. cavirostris
- Mediterranean, Atlantic and Indo-Pacific clusters (strong support)
- East Mediterranean, West Mediterranean clusters (strong support)
M. densirostris
- Indo-Pacific and Atlantic clusters
M. layardii
- No known population structure
M. grayi
- No known population structure
M. eueu
- No known population structure
M. mirus
- No known population structure
M. bidens
- No known population structure
H. ampullatus
- Scotian Shelf (South of Newfoundland) stands out as a cluster
- Canadian Arctic + Jan Mayen, Labrador + Newfoundland form a cluster (some support)
P. macrocephalus
- Mediterranean, Atlantic clusters (strong/some support)
B. physalus
- No known N. Atlantic population structure
E. australis
- Some structure between the Indo-Pacific and South Atlantic Oceans (Patenaude et al. 2007)
B. acutorostrata
- West Greenland, Central N. Atlantic-E. Greenland-Jan Mayen, NE Atlantic, North Sea (some support; Andersen et al. 2003)
T. truncatus*
*Note that the western North Atlantic ecotype has been elevated to the species T. erebennus.
- Many ecotypes (Black Sea (Tursiops truncatus ponticus), western North Atlantic (T. erebennus), Offshore, Mediterranean) (Moura et al. 2020; Costa et al. 2022)
T. australis
- Form one clade in southern Australia (Moura et al. 2020)
T. aduncus
- Two clades, Indian Ocean, and southwest Australia (Moura et al. 2020)
Genotype likelihood calling
We applied conservative filters when calling genotype likelihoods in ANGSD:
- minimum read depth of 5
- minimum base quality score of 30
- minimum read mapping quality score of 30
- SNP cutoff p-value of 1e-6
- minimum minor allele frequency of 0.05
- only reads that mapped to a single place on the reference genome
- only reads with a mate pair
- "bad" reads removed (non-primary, failed and duplicate reads; the ANGSD documentation says little about this filter, but it is a default parameter)
- only autosomal portions of the reference genome considered
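As a rough illustration, the filters above could map onto an ANGSD call as sketched below. This is not the command that was actually run: the file names are hypothetical, and the choice of `-GL 1`, `-doGlf 2` and the use of `-setMinDepthInd` for the depth filter are assumptions made only to give a complete example.

```python
def build_angsd_command(bam_list, reference, regions_file, out_prefix):
    """Return an ANGSD argument list mirroring the filters listed in the notes."""
    return [
        "angsd",
        "-bam", bam_list,            # text file listing one BAM path per line
        "-ref", reference,
        "-rf", regions_file,         # regions file restricted to autosomes
        "-out", out_prefix,
        "-GL", "1",                  # assumption: SAMtools likelihood model
        "-doGlf", "2",               # assumption: Beagle-format likelihood output
        "-doMajorMinor", "1",
        "-doMaf", "1",
        "-doCounts", "1",            # required by the depth filter below
        "-setMinDepthInd", "5",      # assumption: depth of 5 applied per individual
        "-minQ", "30",               # minimum base quality
        "-minMapQ", "30",            # minimum mapping quality
        "-SNP_pval", "1e-6",
        "-minMaf", "0.05",
        "-uniqueOnly", "1",          # reads mapping to one place only
        "-only_proper_pairs", "1",   # reads with a properly paired mate
        "-remove_bads", "1",         # drop non-primary, failed and duplicate reads
    ]


# Hypothetical input names, shown only to print a complete command line.
cmd = build_angsd_command("bams.txt", "reference.fasta", "autosomes.txt", "gl_out")
print(" ".join(cmd))
```

Building the command as a list keeps each filter on its own line, which makes it easier to check against the written description.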
Calculating contribution to variation in datasets
Watterson's theta was calculated for 200 bp windows along the genome for a subset of 7 datasets:
All (M. bidens, M. densirostris, M. grayi, M. layardii, Z. cavirostris, M. mirus, M. eueu, T. truncatus, T. australis, T. aduncus, E. australis, B. acutorostrata, B. physalus)
M. densirostris (Indo-Pacific)
M. densirostris (Atlantic)
T. truncatus (all)
Z. cavirostris (Atlantic and Indo-Pacific excluding the Mediterranean)
H. ampullatus (all)
P. macrocephalus (all)
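For reference, the textbook form of Watterson's estimator divides the number of segregating sites S by the harmonic sum a_n over n sampled sequences. The sketch below shows that calculation for a single window. It assumes called segregating sites; analyses based on genotype likelihoods typically estimate theta from the site frequency spectrum instead, so this illustrates the statistic rather than the pipeline used here. The function and parameter names are hypothetical.

```python
def harmonic_sum(n_sequences):
    """a_n = sum of 1/i for i = 1 .. n-1, where n is the number of sequences."""
    return sum(1.0 / i for i in range(1, n_sequences))


def wattersons_theta(segregating_sites, n_sequences, window_length=200):
    """Per-site Watterson's theta for one window: (S / a_n) / L."""
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    return segregating_sites / harmonic_sum(n_sequences) / window_length


# Toy example: 4 segregating sites in a 200 bp window, 10 sampled chromosomes.
print(wattersons_theta(4, 10))
```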
Filtering windows containing informative SNPs for population structure and species identification
Identifying windows containing SNPs was an iterative process: first, we pulled 200 bp windows of the reference genome that had high diversity among as many reference datasets as possible; second, we added species-specific windows that would be able to inform population structure.
Windows representing multiple species
We pulled 200 bp windows that passed the quality filters from genotype likelihood calling and checked how many were present across the datasets. We settled on 769 windows that were present in at least 4 of the 7 datasets.
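A minimal sketch of the "present in at least 4 of the 7 datasets" rule, assuming each dataset is represented as a collection of window identifiers such as (scaffold, start) tuples. The data layout and names are hypothetical.

```python
from collections import Counter


def windows_shared_by(dataset_windows, min_datasets=4):
    """Return windows that occur in at least `min_datasets` datasets.

    dataset_windows maps a dataset name to an iterable of window identifiers.
    """
    counts = Counter()
    for windows in dataset_windows.values():
        counts.update(set(windows))  # count each window once per dataset
    return sorted(w for w, c in counts.items() if c >= min_datasets)


# Toy example with three datasets and a threshold of two.
toy = {
    "A": [("scaf1", 0), ("scaf1", 200)],
    "B": [("scaf1", 0), ("scaf2", 400)],
    "C": [("scaf1", 0), ("scaf1", 200)],
}
print(windows_shared_by(toy, min_datasets=2))
```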
Next, we wanted to see how likely these windows were to be present across other reference genomes and whether any annotations were assigned to them. We extracted the windows from the M. densirostris genome and BLASTed them against the P. sinus, T. truncatus and M. densirostris genomes to account for family bias (porpoises, delphinids and beaked whales). We retained windows only if they had one unique hit in each genome and at least 150 bp aligning to each reference. We further required an e-value below 10e-4, so that a window was unlikely to have aligned by chance given the size of the reference (i.e. a whole genome) being searched. This resulted in 654 windows aligning confidently. Next, to account for linkage, we removed windows that were within 50,000 bp of the next window; this dropped 30 windows, leaving 624 windows.
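The alignment filter can be sketched as follows, assuming BLAST hits have already been parsed into (window_id, genome, alignment_length, evalue) tuples. The genome labels, the tuple layout, and the reading of the e-value rule as "keep hits below the threshold" are assumptions.

```python
from collections import defaultdict

GENOMES = ("P_sinus", "T_truncatus", "M_densirostris")  # hypothetical labels


def confidently_aligned(hits, genomes=GENOMES, min_length=150, max_evalue=10e-4):
    """Keep windows with exactly one hit per genome that passes both cutoffs."""
    per_window = defaultdict(lambda: defaultdict(list))
    for window_id, genome, length, evalue in hits:
        per_window[window_id][genome].append((length, evalue))

    kept = []
    for window_id, by_genome in per_window.items():
        passes = all(
            len(by_genome.get(g, [])) == 1          # one unique hit in this genome
            and by_genome[g][0][0] >= min_length    # at least 150 bp aligned
            and by_genome[g][0][1] < max_evalue     # e-value below the cutoff
            for g in genomes
        )
        if passes:
            kept.append(window_id)
    return sorted(kept)


# Toy example: w1 passes everywhere, w2 has two hits in one genome.
toy_hits = [
    ("w1", "P_sinus", 200, 1e-50),
    ("w1", "T_truncatus", 190, 1e-40),
    ("w1", "M_densirostris", 200, 1e-90),
    ("w2", "P_sinus", 200, 1e-50),
    ("w2", "P_sinus", 180, 1e-30),
    ("w2", "T_truncatus", 200, 1e-50),
    ("w2", "M_densirostris", 200, 1e-90),
]
print(confidently_aligned(toy_hits))
```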
Windows representing single datasets
Next, we pulled the 500 windows with the highest Watterson's theta for both the Z. cavirostris (Atlantic and Indo-Pacific excluding the Mediterranean) dataset and the T. truncatus (all) dataset. The rationale was that both have multiple distinct populations compared to the other species datasets, and we wanted the SNP panel to be able to pick up the population structure in both species. The Mediterranean population of Z. cavirostris was excluded from this step because, in preliminary analyses, it almost always grouped out on its own.
The 500 windows for Z. cavirostris and 500 windows for T. truncatus were combined. The same filtering steps as previously conducted were then repeated. There were 801 of the 1000 windows that aligned uniquely to all three reference genomes with the same cutoffs as before. There were 30 windows that were within 50,000 bps of another window. This resulted in 735 remaining windows after these two filters.
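Selecting the highest-diversity windows amounts to ranking windows by theta and taking the top 500. A minimal sketch, assuming theta values are held in a dictionary keyed by window identifier (a hypothetical layout):

```python
def top_windows(theta_by_window, n=500):
    """Return the n window identifiers with the highest theta, best first."""
    ranked = sorted(theta_by_window.items(), key=lambda item: item[1], reverse=True)
    return [window for window, _theta in ranked[:n]]


# Toy example: keep the two most diverse of four windows.
toy_theta = {"w1": 0.002, "w2": 0.010, "w3": 0.007, "w4": 0.001}
print(top_windows(toy_theta, n=2))
```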
Combining the panel
The 624 windows representing multiple datasets and the 735 from Z. cavirostris and T. truncatus were combined into one dataset, which resulted in 16 duplicates/overlaps and 1,343 windows remaining in the preliminary panel. Finally, windows were removed if they were within 50,000 bps of another window, resulting in 1,270 final windows. Of these windows, 29 fell in regions annotated in the reference genomes (e.g. ADGRF1, adhesion G protein-coupled receptor F1).
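The combine, de-duplicate and 50,000 bp spacing steps can be sketched as below. The notes do not say which of two nearby windows was dropped, so this version makes two assumptions: distance is measured between window start positions, and the scan is greedy, keeping the first window on each scaffold and dropping any later window that starts less than 50,000 bp after the last kept one.

```python
def thin_windows(windows, min_distance=50_000):
    """De-duplicate (scaffold, start) windows and enforce a minimum spacing."""
    kept = []
    last_kept_start = {}
    for scaffold, start in sorted(set(windows)):   # set() removes exact duplicates
        previous = last_kept_start.get(scaffold)
        if previous is None or start - previous >= min_distance:
            kept.append((scaffold, start))
            last_kept_start[scaffold] = start
    return kept


# Toy example: combine two small panels, then thin.
multi_dataset = [("scaf1", 0), ("scaf1", 60_000)]
single_dataset = [("scaf1", 0), ("scaf1", 30_000), ("scaf2", 100)]
panel = thin_windows(multi_dataset + single_dataset)
print(panel)
```

A different tie-breaking rule (for example, keeping the window with the higher theta) would give a slightly different panel.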
| Species or group | SNPs |
|---|---|
| Ziphius cavirostris | 2067 |
| Tursiops truncatus | 1032 |
| Tursiops spp. | 906 |
| M. densirostris | 887 |
| Ziphiids (no H. ampullatus) | 747 |
| Physeter macrocephalus | 102 |
| Mysticetes | 88 |
| H. ampullatus | 12 |
Z. cavirostris (2,067)
Ecotype
ddRAD populations
M. densirostris (887)
Ecotype
ddRAD population
H. ampullatus (12)
Population
Ziphiids excluding H. ampullatus (747)
Species
Ecotype
Tursiops spp. (906)
Species
Ecotype
Tursiops truncatus (1,032)
Ecotype
Population
P. macrocephalus (102)
Population
Geographical location
Mysticetes (88)
A note on power
As a spot check on how few SNPs are needed to detect finer-scale population structure, we tested how many SNPs would be necessary to tease out the structure between Atlantic and Indo-Pacific Z. cavirostris. Note that these were specific SNPs and not windows (as in all other analyses).